causal tree
- North America > United States > California (0.04)
- Europe > United Kingdom > England (0.04)
Sample Efficient Active Learning of Causal Trees
We consider the problem of experimental design for learning causal graphs that have a tree structure. We propose an adaptive framework that determines the next intervention based on a Bayesian prior updated with the outcomes of previous experiments, focusing on the setting where observational data is cheap (assumed infinite) and interventional data is expensive. While information-greedy approaches are popular in active learning, we show that in this setting they can be exponentially suboptimal in the number of interventions required, and instead propose an algorithm that exploits graph structure in the form of a centrality measure. If infinite interventional data is available, we show that the algorithm requires at most twice the minimum achievable number of interventions. We show that the algorithm and the associated theory can be adapted to the setting where each intervention yields only finitely many samples. We also present several extensions: to the case where a specified set of nodes cannot be intervened on, to the case where $K$ interventions are scheduled at once, and to the fully adaptive case where each experiment yields a single sample. For finite interventional data, simulated experiments show that our algorithms outperform several adaptive baseline algorithms.
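Under infinite observational data the tree skeleton is known, and because a tree with no v-structures gives every node at most one parent, the causal graph is fixed once its root is identified. Intervening on a node v then reveals which neighbors are its children (their distributions shift), so the root is either v itself or lies in the one remaining component of the tree with v removed. The sketch below assumes such a noiseless intervention oracle and uses a prior-weighted tree centroid as a stand-in for the paper's centrality measure, which the abstract does not specify; all function names are hypothetical.

```python
from collections import deque


def components_without(adj, v, alive):
    """Connected components of the subtree `alive` after deleting node v."""
    seen = {v}
    comps = []
    for start in adj[v]:
        if start not in alive or start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            comp.add(u)
            for nb in adj[u]:
                if nb in alive and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        comps.append(comp)
    return comps


def weighted_centroid(adj, alive, prior):
    """Node whose removal minimizes the largest remaining prior mass (assumed centrality)."""
    best, best_cost = None, float("inf")
    for v in sorted(alive):  # sorted for deterministic tie-breaking
        comps = components_without(adj, v, alive)
        cost = max((sum(prior[u] for u in c) for c in comps), default=0.0)
        if cost < best_cost:
            best, best_cost = v, cost
    return best


def find_root(adj, prior, true_root):
    """Locate the root using a simulated noiseless intervention oracle."""
    alive = set(adj)  # candidate roots; always a connected subtree
    n_interventions = 0
    while len(alive) > 1:
        v = weighted_centroid(adj, alive, prior)
        n_interventions += 1
        if v == true_root:  # every neighbor of v responds, so v is the root
            return v, n_interventions
        # The component holding v's parent is the one containing the root.
        for comp in components_without(adj, v, alive):
            if true_root in comp:
                alive = comp
                break
    return next(iter(alive)), n_interventions


# Usage: a 7-node path 0-1-2-3-4-5-6 with a uniform prior.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 7] for i in range(7)}
uniform = {i: 1.0 for i in path}
print(find_root(path, uniform, true_root=5))  # → (5, 2)
```

With a uniform prior on a path, the centroid rule behaves like binary search, so the number of interventions grows logarithmically with the number of nodes; on a star, a single intervention on the center suffices.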
LLM-based Agents for Automated Confounder Discovery and Subgroup Analysis in Causal Inference
Lee, Po-Han, Lin, Yu-Cheng, Ku, Chan-Tung, Hsu, Chan, Huang, Pei-Cing, Wu, Ping-Hsun, Kang, Yihuang
Estimating individualized treatment effects from observational data remains challenging because of unmeasured confounding and structural bias. Causal machine learning (causal ML) methods, such as causal trees and doubly robust estimators, provide tools for estimating conditional average treatment effects. However, these methods are less effective in complex real-world settings, where confounders may be latent or recorded only in unstructured formats. Moreover, relying on domain experts to identify confounders and interpret rules incurs high annotation costs and limits scalability. In this work, we propose Large Language Model (LLM)-based agents for automated confounder discovery and subgroup analysis, integrating agents into the causal ML pipeline to simulate domain expertise. Our framework systematically performs subgroup identification and confounding structure discovery by leveraging the reasoning capabilities of LLM-based agents, reducing human dependency while preserving interpretability. Experiments on real-world medical datasets show that our approach makes treatment effect estimation more robust, narrowing confidence intervals and uncovering previously unrecognized confounding biases. Our findings suggest that LLM-based agents offer a promising path toward scalable, trustworthy, and semantically aware causal inference.
- Asia > Taiwan > Takao Province > Kaohsiung (0.06)
- North America > United States > Virginia > Arlington County > Arlington (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- Health & Medicine > Therapeutic Area (0.47)
- Information Technology > Security & Privacy (0.46)
LILI clustering algorithm: Limit Inferior Leaf Interval Integrated into Causal Forest for Causal Interference
Dong, Yiran, Fan, Di, Gao, Chuanhou
Causal forest methods are powerful tools in causal inference. Like a traditional random forest in machine learning, a causal forest treats each causal tree independently. However, this independence increases the likelihood that classification errors in one tree are repeated in others, potentially leading to significant bias in causal effect estimation. In this paper, we propose a novel approach that establishes connections between causal trees through the Limit Inferior Leaf Interval (LILI) clustering algorithm. LILIs are constructed from the leaves of all causal trees, emphasizing the similarity of dataset confounders. When two instances with different treatments are grouped into the same leaf across a sufficient number of causal trees, they are treated as each other's counterfactuals. Through this clustering mechanism, LILI clustering reduces the bias present in traditional causal tree methods and improves the prediction accuracy of the average treatment effect (ATE). By integrating LILIs into a causal forest, we develop an efficient causal inference method. Moreover, we explore several key properties of LILI by relating it to the concepts of limit inferior and limit superior in set theory. Theoretical analysis rigorously proves the convergence of the ATE estimated with LILI clustering. Empirically, extensive comparative experiments demonstrate the superior performance of LILI clustering.
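The matching rule stated in the abstract (instances with different treatments that share a leaf in enough trees serve as each other's counterfactuals) can be illustrated with a simplified co-leaf matching estimator. This is a hypothetical sketch, not the LILI construction itself: it skips the limit-inferior leaf intervals entirely, `min_share` is an assumed threshold on the fraction of trees in which two instances share a leaf, and instances without any partner are simply dropped.

```python
import numpy as np


def co_leaf_ate(leaves, w, y, min_share=0.8):
    """ATE from counterfactual partners found by leaf co-occurrence.

    leaves[i, t] is the leaf index of instance i in tree t, w is a 0/1
    treatment vector, and y holds the observed outcomes.
    """
    leaves = np.asarray(leaves)
    w = np.asarray(w)
    y = np.asarray(y, dtype=float)
    # share[i, j]: fraction of trees in which i and j fall in the same leaf.
    share = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
    effects = []
    for i in range(len(y)):
        partners = (w != w[i]) & (share[i] >= min_share)
        if not partners.any():
            continue  # no counterfactual partner for instance i
        y_cf = y[partners].mean()  # imputed counterfactual outcome
        effects.append(y[i] - y_cf if w[i] == 1 else y_cf - y[i])
    return float(np.mean(effects)) if effects else float("nan")


leaves = [[0, 0, 0], [0, 0, 0], [1, 1, 1], [1, 1, 1]]
print(co_leaf_ate(leaves, w=[1, 0, 1, 0], y=[5.0, 2.0, 10.0, 4.0]))  # → 4.5
```

Requiring co-occurrence across many trees, rather than in a single tree, is what keeps one tree's misclassification from creating a spurious counterfactual pair.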
- North America > United States (0.14)
- Asia > Middle East > Iran (0.05)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > Strength High (0.92)
Causal Tree Extraction from Medical Case Reports: A Novel Task for Experts-like Text Comprehension
Yahata, Sakiko, Wan, Zhen, Cheng, Fei, Kurohashi, Sadao, Sato, Hisahiko, Nagai, Ryozo
Extracting causal relationships from a medical case report is essential for comprehending the case, particularly its diagnostic process. Since the diagnostic process is regarded as a bottom-up inference, causal relationships in cases naturally form a multi-layered tree structure. Existing tasks, such as medical relation extraction, are insufficient for capturing the causal relationships of an entire case, as they treat all relations equally without considering the hierarchical structure inherent in the diagnostic process. We therefore propose a novel task, Causal Tree Extraction (CTE), which takes a case report as input and generates a causal tree with the primary disease as the root, providing an intuitive understanding of the case's diagnostic process. We then construct J-Casemap, a Japanese case report CTE dataset; propose a generation-based CTE method that outperforms the baseline by 20.2 points in human evaluation; and introduce evaluation metrics that reflect clinician preferences. Further experiments show that J-Casemap improves performance on other medical tasks, such as question answering.
- Asia > Middle East > Jordan (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- Health & Medicine > Diagnostic Medicine (1.00)
- Health & Medicine > Therapeutic Area > Oncology (0.46)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.31)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Reviews: Sample Efficient Active Learning of Causal Trees
The authors propose a suite of algorithms for learning the structure of the causal graph under different assumptions (infinite and finite interventional samples, single vs. K interventions, non-manipulable variables). The assumption about the type of underlying causal graph is quite stringent: a tree with no v-structures. The authors do not provide a compelling real-world example where this assumption makes sense. Nevertheless, this work seems to provide theoretical insight into this very specific class of problems. Overall, the paper is clearly written and generally easy to follow (there are some issues with how the paper is organized, which I discuss below).
Reviews: Sample Efficient Active Learning of Causal Trees
As pointed out by the reviewers, these are the strengths and weaknesses of the paper: STRENGTHS: The paper proposes algorithms for learning causal trees from interventional data under various assumptions, including infinite observational and interventional data, finite interventional data, K interventions scheduled at once, and restrictions on which tree nodes can be intervened on. It provides a theoretical analysis of bounds on the number of required interventions. The paper is overall clearly written. FOR IMPROVEMENT: The main concern about this paper is the applicability of the proposed algorithms, since they focus only on a very specific type of causal graph (causal trees with no v-structures). The authors should discuss the significance of being able to learn such graphs.
Causal Rule Forest: Toward Interpretable and Precise Treatment Effect Estimation
Hsu, Chan, Wu, Jun-Ting, Kang, Yihuang
Understanding and inferring Heterogeneous Treatment Effects (HTE) and Conditional Average Treatment Effects (CATE) are vital for developing personalized treatment recommendations. Many state-of-the-art approaches achieve impressive performance in estimating HTE on benchmark datasets or in simulation studies. However, their indirect prediction strategies and complex model architectures reduce their interpretability. To narrow the gap between predictive performance and heterogeneity interpretability, we introduce the Causal Rule Forest (CRF), a novel approach that learns hidden patterns from data and transforms them into interpretable multi-level Boolean rules. By training other interpretable causal inference models on the data representation learned by CRF, we can reduce these models' predictive errors in estimating HTE and CATE while preserving their interpretability for identifying subgroups in which a treatment is more effective. Our experiments underscore the potential of CRF to advance personalized interventions and policies, paving the way for future research to enhance its scalability and application across complex causal inference challenges.
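The abstract does not detail how CRF learns its rules. As a hypothetical illustration only, the sketch below shows what a two-level Boolean rule over binary features looks like and how the subgroup it defines can be inspected for a larger treatment effect. It assumes randomized treatment, so that a within-subgroup difference in means estimates that subgroup's CATE; the rule and the function names are made up for this example.

```python
import numpy as np


def example_rule(x):
    """Hypothetical two-level rule: (x0 AND x1) OR (NOT x2)."""
    return (x[:, 0] & x[:, 1]) | ~x[:, 2]


def subgroup_effect(x, w, y, rule):
    """Difference in mean outcomes (treated minus control) inside the rule's subgroup."""
    x = np.asarray(x, dtype=bool)
    w = np.asarray(w, dtype=bool)
    y = np.asarray(y, dtype=float)
    in_group = rule(x)
    treated, control = in_group & w, in_group & ~w
    if not treated.any() or not control.any():
        return float("nan")  # effect not estimable without both arms
    return float(y[treated].mean() - y[control].mean())


x = [[1, 1, 1], [1, 1, 1], [0, 0, 0], [0, 0, 0], [1, 0, 1], [1, 0, 1]]
w = [1, 0, 1, 0, 1, 0]
y = [5.0, 1.0, 7.0, 3.0, 100.0, 0.0]
print(subgroup_effect(x, w, y, example_rule))  # → 4.0
```

Because each rule is an explicit AND/OR formula, the subgroup it selects can be read off directly, which is the interpretability property the abstract emphasizes.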
- North America > United States (0.14)
- Asia > Taiwan > Takao Province > Kaohsiung (0.05)
- Research Report > New Finding (0.68)
- Research Report > Promising Solution (0.68)
- Overview > Innovation (0.54)